Supervised Random Walks: Predicting and Recommending Links in Social Networks
Predicting the occurrence of links is a fundamental problem in networks. In
the link prediction problem we are given a snapshot of a network and would like
to infer which interactions among existing members are likely to occur in the
near future or which existing interactions we are missing. Although this
problem has been extensively studied, the challenge of how to effectively
combine the information from the network structure with rich node and edge
attribute data remains largely open.
We develop an algorithm based on Supervised Random Walks that naturally
combines the information from the network structure with node and edge level
attributes. We achieve this by using these attributes to guide a random walk on
the graph. We formulate a supervised learning task where the goal is to learn a
function that assigns strengths to edges in the network such that a random
walker is more likely to visit the nodes to which new links will be created in
the future. We develop an efficient training algorithm to directly learn the
edge strength estimation function.
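
The idea of letting edge attributes guide a random walk can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic strength function, the restart probability `alpha`, the toy graph, and the weight vector `w` are all assumptions chosen for illustration, and `w` is fixed by hand here rather than learned by the training algorithm the abstract describes.

```python
import numpy as np

def edge_strengths(features, w):
    """Map edge features of shape (n, n, d) to strengths of shape (n, n).

    Assumption: a logistic function of w . features; other choices are possible.
    """
    return 1.0 / (1.0 + np.exp(-(features @ w)))

def walk_scores(adj, strengths, source, alpha=0.15, iters=200):
    """Stationary scores of a random walk with restart at `source`.

    Transition probabilities are proportional to edge strength.
    """
    a = adj * strengths                      # keep strengths only on real edges
    row_sums = a.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0            # avoid division by zero
    q = a / row_sums                         # row-stochastic transition matrix
    restart = np.zeros(adj.shape[0])
    restart[source] = 1.0
    p = restart.copy()
    for _ in range(iters):                   # power iteration
        p = (1 - alpha) * (p @ q) + alpha * restart
    return p

# Hypothetical 4-node undirected graph: edges 0-1, 0-2, 1-3, 2-3.
adj = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

# One made-up feature per edge: high on 0-1, low on 0-2, zero elsewhere.
features = np.zeros((4, 4, 1))
features[0, 1, 0] = features[1, 0, 0] = 2.0
features[0, 2, 0] = features[2, 0, 0] = -2.0

w = np.array([1.0])                          # fixed by hand, not learned
scores = walk_scores(adj, edge_strengths(features, w), source=0)
```

In this toy setup the walker starting at node 0 visits node 1 more often than node 2, because the edge 0-1 is assigned the larger strength. A learned `w` would be chosen so that such highly visited nodes coincide with future link targets.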
Our experiments on the Facebook social graph and large collaboration networks
show that our approach outperforms state-of-the-art unsupervised approaches as
well as approaches that are based on feature extraction.
BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking
Data generation is a key issue in big data benchmarking that aims to generate
application-specific data sets to meet the 4V requirements of big data.
Specifically, big data generators need to generate scalable data (Volume) of
different types (Variety) under controllable generation rates (Velocity) while
keeping the important characteristics of raw data (Veracity). This gives rise
to various new challenges about how we design generators efficiently and
successfully. To date, most existing techniques can only generate limited types
of data and support specific big data systems such as Hadoop. Hence we develop
a tool, called Big Data Generator Suite (BDGS), to efficiently generate
scalable big data while employing data models derived from real data to
preserve data veracity. The effectiveness of BDGS is demonstrated by developing
six data generators covering three representative data types (structured,
semi-structured and unstructured) and three data sources (text, graph, and
table data).
Time-Varying Graphs and Dynamic Networks
The past few years have seen intensive research efforts carried out in some
apparently unrelated areas of dynamic systems -- delay-tolerant networks,
opportunistic-mobility networks, social networks -- obtaining closely related
insights. Indeed, the concepts discovered in these investigations can be viewed
as parts of the same conceptual universe; and the formal models proposed so far
to express some specific concepts are components of a larger formal description
of this universe. The main contribution of this paper is to integrate the vast
collection of concepts, formalisms, and results found in the literature into a
unified framework, which we call TVG (for time-varying graphs). Using this
framework, it is possible to express directly in the same formalism not only
the concepts common to all those different areas, but also those specific to
each. Based on this definitional work, employing both existing results and
original observations, we present a hierarchical classification of TVGs; each
class corresponds to a significant property examined in the distributed
computing literature. We then examine how TVGs can be used to study the
evolution of network properties, and propose different techniques, depending on
whether the indicators for these properties are a-temporal (as in the majority
of existing studies) or temporal. Finally, we briefly discuss the introduction
of randomness in TVGs.
Comment: A short version appeared in ADHOC-NOW'11. This version is to be
published in International Journal of Parallel, Emergent and Distributed
Systems.
Stable and Efficient Structures for the Content Production and Consumption in Information Communities
Real-world information communities exhibit inherent structures that
characterize a system that is stable and efficient for content production and
consumption. In this paper, we study such structures through mathematical
modelling and analysis. We formulate a generic model of a community in which
each member decides how they allocate their time between content production and
consumption with the objective of maximizing their individual reward. We define
the community system as "stable and efficient" when a Nash equilibrium is
reached while the social welfare of the community is maximized. We investigate
the conditions for forming a stable and efficient community under two
variations of the model representing different internal relational structures
of the community. Our analysis shows that the structure with "a small
core of celebrity producers" is optimally stable and efficient for a
community. These results provide possible explanations for
sociological observations such as "the Law of the Few" and also provide
insights into how to effectively build and maintain the structure of
information communities.
Comment: 21 pages
Kronecker Graphs: An Approach to Modeling Networks
How can we model networks with a mathematically tractable model that allows
for rigorous analysis of network properties? Networks exhibit a long list of
surprising properties: heavy tails for the degree distribution; small
diameters; and densification and shrinking diameters over time. Most present
network models either fail to match several of the above properties, are
complicated to analyze mathematically, or both. In this paper we propose a
generative model for networks that is both mathematically tractable and can
generate networks that have the above mentioned properties. Our main idea is to
use the Kronecker product to generate graphs that we refer to as "Kronecker
graphs".
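
As a rough illustration of the Kronecker-product idea, the sketch below raises a small initiator matrix to a Kronecker power and samples a graph from the resulting edge probabilities. It is a sketch under stated assumptions, not the paper's code: the 2 x 2 initiator values, the function names, and the independent per-edge sampling are chosen here for illustration only.

```python
import numpy as np

def kronecker_power(initiator, k):
    """Return the k-th Kronecker power of a square initiator matrix."""
    result = np.array(initiator, dtype=float)
    for _ in range(k - 1):
        result = np.kron(result, initiator)
    return result

def sample_graph(prob_matrix, seed=0):
    """Sample a 0/1 adjacency matrix, treating each entry as an
    independent edge probability (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    return (rng.random(prob_matrix.shape) < prob_matrix).astype(int)

# Hypothetical initiator; entries are edge probabilities in [0, 1].
initiator = np.array([[0.9, 0.5],
                      [0.5, 0.1]])

probs = kronecker_power(initiator, 3)   # 8 x 8 edge-probability matrix
adjacency = sample_graph(probs)         # one sampled directed graph
```

Each Kronecker power multiplies the number of nodes by the initiator size, so a 2 x 2 initiator raised to the power k describes a graph on 2^k nodes with only four parameters.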
First, we prove that Kronecker graphs naturally obey common network
properties. We also provide empirical evidence showing that Kronecker graphs
can effectively model the structure of real networks.
We then present KronFit, a fast and scalable algorithm for fitting the
Kronecker graph generation model to large real networks. A naive approach to
fitting would take super-exponential time. In contrast, KronFit takes linear
time, by exploiting the structure of Kronecker matrix multiplication and by
using statistical simulation techniques.
Experiments on large real and synthetic networks show that KronFit finds
accurate parameters that closely mimic the properties of target
networks. Once fitted, the model parameters can be used to gain insights about
the network structure, and the resulting synthetic graphs can be used for
null-models, anonymization, extrapolations, and graph summarization.
Analysis of the Web Graph Aggregated by Host and Pay-Level Domain
In this paper the web is analyzed as a graph aggregated by host and pay-level
domain (PLD). The publicly available web graph datasets were released by
the Common Crawl Foundation and are based on a web crawl performed from
May to July 2017. The host graph has 1.3 billion nodes and
5.3 billion arcs. The PLD graph has 91 million nodes and 1.1
billion arcs. We study the distributions of degree and sizes of strongly/weakly
connected components (SCC/WCC), focusing on power-law detection using
statistical methods. The statistical plausibility of the power law model is
compared with that of several alternative distributions. While there is no
evidence of power law tails at the host level, they emerge under PLD aggregation for
indegree, SCC and WCC size distributions. Finally, we analyze distance-related
features by studying the cumulative distributions of the shortest path lengths,
and give an estimation of the diameters of the graphs.
Shortest path discovery of complex networks
In this paper we present an analytic study of sampled networks in the case of
some important shortest-path sampling models. We present analytic formulas for
the probability of edge discovery in the case of an evolving and a static
network model. We also show that the number of discovered edges in a finite
network scales much more slowly than predicted by earlier mean field models.
Finally, we calculate the degree distribution of sampled networks, and we
demonstrate that they are analogous to a destructed network obtained by
randomly removing edges from the original network.
Comment: 10 pages, 4 figures
Co-community Structure in Time-varying Networks
In this report, we introduce the concept of co-community structure in
time-varying networks. We propose a novel optimization algorithm to rapidly
detect co-community structure in these networks. Both theoretical and numerical
results show that the proposed method can not only resolve detailed
co-communities but also effectively identify the dynamical phenomena in
these networks.
Comment: 5 pages, 6 figures
The Dynamics of Viral Marketing
We present an analysis of a person-to-person recommendation network,
consisting of 4 million people who made 16 million recommendations on half a
million products. We observe the propagation of recommendations and the cascade
sizes, which we explain by a simple stochastic model. We analyze how user
behavior varies within user communities defined by a recommendation network.
Product purchases follow a 'long tail' where a significant share of purchases
belongs to rarely sold items. We establish how the recommendation network grows
over time and how effective it is from the viewpoint of the sender and receiver
of the recommendations. While on average recommendations are not very effective
at inducing purchases and do not spread very far, we present a model that
successfully identifies communities, product and pricing categories for which
viral marketing seems to be very effective.